Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Similar resources

Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the sta...

[hal-00829532, v2] Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the state with maximal advan...

[hal-00829532, v3] Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the state with maximal advan...
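
The two update rules named in the abstracts above can be illustrated with a minimal sketch. It assumes a tabular MDP stored as NumPy arrays, with `P[s, a, s2]` holding transition probabilities and `r[s, a]` holding rewards. The function names, the array layout, the `tol` threshold, and the random test MDP are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def evaluate(P, r, policy, gamma):
    """Value of a fixed policy: solve (I - gamma * P_pi) v = r_pi."""
    n = P.shape[0]
    idx = np.arange(n)
    P_pi = P[idx, policy]   # (n, n) transition matrix under the policy
    r_pi = r[idx, policy]   # (n,) reward vector under the policy
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def policy_iteration(P, r, gamma, variant="howard", tol=1e-10):
    """Run PI and return (policy, value, number of policy updates).

    variant="howard":  switch every state with a positive advantage.
    variant="simplex": switch only the state with the largest advantage.
    """
    n, m = r.shape
    policy = np.zeros(n, dtype=int)   # arbitrary starting policy (assumption)
    iterations = 0
    while True:
        v = evaluate(P, r, policy, gamma)
        q = r + gamma * (P @ v)       # (n, m) action values
        adv = q - v[:, None]          # advantage of each action over the policy
        best_adv = adv.max(axis=1)
        if best_adv.max() <= tol:     # no improving switch is left
            return policy, v, iterations
        if variant == "howard":
            improve = best_adv > tol
            policy = np.where(improve, adv.argmax(axis=1), policy)
        else:
            s = int(best_adv.argmax())
            policy = policy.copy()
            policy[s] = int(adv[s].argmax())
        iterations += 1

# Small random MDP, used only to exercise both variants.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 3, 0.9
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)     # normalize rows into distributions
r = rng.random((n_states, n_actions))

pi_h, v_h, it_h = policy_iteration(P, r, gamma, variant="howard")
pi_s, v_s, it_s = policy_iteration(P, r, gamma, variant="simplex")
print("Howard updates:", it_h, "Simplex updates:", it_s)
```

Both variants stop at a policy with no positive advantage, so they should return the same value vector even when their iteration counts differ.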

About upper bounds on the complexity of Policy Iteration

We consider Acyclic Unique Sink Orientations of the n-dimensional hyper-cube (AUSOs), that is, acyclic orientations of the edges of the hyper-cube such that any sub-cube has a unique vertex of maximal in-degree. We study the Policy Iteration (PI) algorithm, also known as Bottom-Antipodal or Switch-All, to find the global sink: starting from an initial vertex π0, i = 0, the outgoing links at the p...

The Effect of Task Complexity on Lexical Complexity and Grammatical Accuracy of EFL Learners’ Argumentative Writing

Based on Robinson's Cognition Hypothesis (2001, 2003, 2005) and Skehan's Limited Attentional Capacity Model (1998), this study examined the effect of task complexity on the lexical complexity and grammatical accuracy of the argumentative writing of 60 English-language students. Task complexity was determined through resource-dispersing factors. All participants were semi-randomly assigned to one of three groups: (1) the topic group, (2) the topic + idea group, and (3) the topic + idea...

Journal

Journal title: Mathematics of Operations Research

Year: 2016

ISSN: 0364-765X,1526-5471

DOI: 10.1287/moor.2015.0753